Words containing ,:

Rank in Wordlist | Frequency | Word |
---|---|---|
1763 | 199 | 1,5 |
3678 | 101 | 1,2 |
3824 | 97 | 2,5 |
4302 | 87 | 1,1 |
4514 | 83 | 0,7 |
5226 | 72 | 1,3 |
5229 | 72 | 3,5 |
5290 | 71 | 1,4 |
5291 | 71 | 1,7 |
5708 | 66 | 0,5 |

Words containing %:

Rank in Wordlist | Frequency | Word |
---|---|---|
18035 | 19 | 100% |
23519 | 14 | 30% |
34363 | 9 | 90% |
37721 | 8 | 10% |
37768 | 8 | 20% |
37807 | 8 | 50% |
37822 | 8 | 80% |
42215 | 7 | 5% |
47564 | 6 | 15% |
47688 | 6 | 75% |

Words containing &:

Rank in Wordlist | Frequency | Word |
---|---|---|
11366 | 32 | S&P |
42544 | 7 | H&M |
57217 | 5 | S&D |
66366 | 4 | H&M:n |
87237 | 3 | Sannikka&Ukkola |
120097 | 2 | PG&E:n |
123951 | 2 | T&E |
174997 | 1 | AT&T |
174998 | 1 | AT&T:n |
174999 | 1 | AT&T:tä |

Words containing $:

Rank in Wordlist | Frequency | Word |
---|---|---|
174714 | 1 | A$AP |
215585 | 1 | M$:n |

Words containing ":

Rank in Wordlist | Frequency | Word |
---|---|---|
877 | 369 | ." |

Words containing ':

Rank in Wordlist | Frequency | Word |
---|---|---|
25129 | 13 | Glasgow'ssa |
65160 | 4 | .' |
65681 | 4 | Andrew'n |
66337 | 4 | Glasgow'n |
68070 | 4 | O'Sullivan |
83009 | 3 | Heathrow'n |
85893 | 3 | O'Neill |
110885 | 2 | Captain's |
111159 | 2 | D'Andrean |
119152 | 2 | Nabab's |

Words containing +:

Rank in Wordlist | Frequency | Word |
---|---|---|
15202 | 23 | 1+1 |
25050 | 13 | 2+2 |
31442 | 10 | 2+1 |
34282 | 9 | 1+2 |
47667 | 6 | 5+20 |
65161 | 4 | 0+2 |
80771 | 3 | 1+3 |
80772 | 3 | 1+4 |
80979 | 3 | 2+3 |
81142 | 3 | 3+4 |

Words containing /:

Rank in Wordlist | Frequency | Word |
---|---|---|
92 | 2366 | https://www |
7418 | 51 | km/h |
15466 | 23 | ja/tai |
18547 | 19 | mg/l |
19012 | 18 | M/S |
24338 | 14 | m/s |
28953 | 11 | 24/7 |
40433 | 8 | mukaanhttps://www |
42337 | 7 | BBChttps://www |
44275 | 7 | https://twitter |

In the last subsection of this type, we look for words containing other special characters: , ( ) % & $ " ' + * = / _
Depending on the language, some of these characters may be allowed within words; others will not. If words containing forbidden characters do not have a very low frequency, there may be a problem in preprocessing.
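
The check described above can be sketched in a few lines of Python. This is only an illustration: the function name `suspicious_words`, the `allowed` set, and the `min_freq` threshold are assumptions made for this example and are not part of the original tooling; which characters are allowed and what counts as "very low frequency" depend on the language and the corpus.

```python
# The special characters examined in this subsection.
SPECIAL_CHARS = set(",()%&$\"'+*=/_")

def suspicious_words(wordlist, allowed=frozenset(), min_freq=10):
    """Return (word, freq, offending characters) for words that contain
    a special character not listed in `allowed` and whose frequency is
    at least `min_freq` (a hypothetical threshold)."""
    forbidden = SPECIAL_CHARS - set(allowed)
    hits = []
    for word, freq in wordlist:
        found = sorted(set(word) & forbidden)
        if found and freq >= min_freq:
            hits.append((word, freq, "".join(found)))
    return hits

# A few (word, frequency) pairs taken from the tables in this section.
sample = [("1,5", 199), ("100%", 19), ("S&P", 32),
          ("A$AP", 1), ("km/h", 51), ('."', 369)]

# Assumption for the example: , % & / are acceptable inside words.
flagged = suspicious_words(sample, allowed={",", "%", "&", "/"}, min_freq=10)
print(flagged)
```

With these assumed settings only `."` is reported: `A$AP` contains a character outside the allowed set but occurs once, so it falls below the threshold, while a sentence-final `."` with frequency 369 points to a possible tokenization issue.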
Example query for words containing %:
select w_id-100,freq, word from words where w_id>100 and word like "%\%%" limit 10;
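
The same query can be generalized to the other characters. The sketch below uses Python's `sqlite3` module with an in-memory database purely for illustration; the original query is written for a different SQL dialect (double-quoted string, backslash escape). The table layout `words(w_id, word, freq)` and the offset of 100 between `w_id` and rank are inferred from the query above, while the helper names `like_pattern` and `words_containing`, the explicit `ESCAPE` clause, and the `ORDER BY` are assumptions added for this example.

```python
import sqlite3

def like_pattern(ch, escape="\\"):
    """Build a LIKE pattern matching any word that contains `ch`.
    % and _ are LIKE wildcards, so they (and the escape character
    itself) must be escaped to be matched literally."""
    if ch in ("%", "_", escape):
        ch = escape + ch
    return f"%{ch}%"

def words_containing(conn, ch, limit=10):
    """Return (rank, freq, word) rows for words containing `ch`."""
    sql = (
        "SELECT w_id - 100, freq, word FROM words "
        "WHERE w_id > 100 AND word LIKE ? ESCAPE '\\' "
        "ORDER BY w_id LIMIT ?"
    )
    return conn.execute(sql, (like_pattern(ch), limit)).fetchall()

# Small in-memory demo using rows from the tables in this section
# (w_id = rank + 100, as implied by the query).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (w_id INTEGER PRIMARY KEY, word TEXT, freq INTEGER)")
conn.executemany(
    "INSERT INTO words (w_id, word, freq) VALUES (?, ?, ?)",
    [(1863, "1,5", 199), (7518, "km/h", 51), (11466, "S&P", 32),
     (18135, "100%", 19), (23619, "30%", 14)],
)

percent_rows = words_containing(conn, "%")
underscore_rows = words_containing(conn, "_")
print(percent_rows)
print(underscore_rows)
```

Escaping matters most for `%` and `_`: without it, a pattern built from `_` would match every word of at least one character instead of only words containing an underscore.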